A Variational Perspective on Accelerated Methods in Optimization
Accelerated gradient methods play a central role in optimization, achieving
optimal rates in many settings. While many generalizations and extensions of
Nesterov's original acceleration method have been proposed, it is not yet clear
what is the natural scope of the acceleration concept. In this paper, we study
accelerated methods from a continuous-time perspective. We show that there is a
Lagrangian functional that we call the \emph{Bregman Lagrangian} which
generates a large class of accelerated methods in continuous time, including
(but not limited to) accelerated gradient descent, its non-Euclidean extension,
and accelerated higher-order gradient methods. We show that the continuous-time
limit of all of these methods corresponds to traveling the same curve in
spacetime at different speeds. From this perspective, Nesterov's technique and
many of its generalizations can be viewed as a systematic way to go from the
continuous-time curves generated by the Bregman Lagrangian to a family of
discrete-time accelerated algorithms.
Comment: 38 pages. Subsumes an earlier working draft arXiv:1509.0361
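The abstract's claim that discrete accelerated methods trace a single continuous-time curve can be illustrated numerically. The sketch below is not from the paper itself: it uses the known continuous-time limit of Nesterov's accelerated gradient descent, the ODE $\ddot X + (3/t)\dot X + \nabla f(X) = 0$ (a special case of the Bregman Lagrangian flow), discretizes it with a simple Euler scheme on the test function $f(x) = x^2/2$, and runs Nesterov's discrete method on the same function; both drive the iterate toward the minimizer $x^* = 0$. All step sizes and horizons are illustrative choices.

```python
import math

def grad_f(x):
    # gradient of the test function f(x) = 0.5 * x**2
    return x

def ode_trajectory(x0, dt=1e-3, T=10.0):
    # Semi-implicit Euler discretization of  X'' + (3/t) X' + grad_f(X) = 0,
    # the continuous-time limit of Nesterov's accelerated gradient descent.
    x, v, t = x0, 0.0, dt  # start just above t = 0 to avoid the 3/t blow-up
    while t < T:
        a = -(3.0 / t) * v - grad_f(x)  # acceleration prescribed by the ODE
        v += dt * a
        x += dt * v
        t += dt
    return x

def nesterov(x0, step=1e-3, iters=10000):
    # Nesterov's discrete accelerated gradient method on the same f.
    x, y = x0, x0
    for k in range(1, iters + 1):
        x_new = y - step * grad_f(y)
        y = x_new + (k - 1) / (k + 2) * (x_new - x)
        x = x_new
    return x

print(abs(ode_trajectory(1.0)))  # both approach the minimizer x* = 0
print(abs(nesterov(1.0)))
```

With the (k-1)/(k+2) momentum coefficient, the discrete iterates shadow the ODE solution, which is the sense in which the paper views discretization speed as the only difference between methods in this family.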
Sufficient Conditions for Uniform Stability of Regularization Algorithms
In this paper, we study the stability and generalization properties of penalized empirical-risk minimization algorithms. We propose a set of properties of the penalty term that is sufficient to ensure uniform β-stability: we show that if the penalty function satisfies a suitable convexity property, then the induced regularization algorithm is uniformly β-stable. In particular, our results imply that regularization algorithms with penalty functions which are strongly convex on bounded domains are β-stable. In view of the results in [3], uniform stability implies generalization, and moreover, consistency results can be easily obtained. We apply our results to show that ℓ_p regularization for 1 < p <= 2 and elastic-net regularization are uniformly β-stable, and therefore generalize.
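The stability phenomenon the abstract describes can be seen in a toy experiment, not taken from the paper: for one-dimensional ridge regression (an ℓ_2 penalty, which is strongly convex), replacing a single training example moves the learned weight by at most O(1/(λn)). The closed-form minimizer and the λ = 1 setting below are illustrative choices.

```python
import random

def ridge_1d(data, lam):
    # Closed-form minimizer of (1/n) * sum((w*x - y)**2) + lam * w**2
    n = len(data)
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam * n)

random.seed(0)

def stability_gap(n, lam=1.0):
    # Train on a dataset S and on S with one example replaced,
    # and measure how far the learned weight moves.
    data = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]
    w = ridge_1d(data, lam)
    perturbed = data[:-1] + [(random.uniform(-1, 1), random.uniform(-1, 1))]
    w_pert = ridge_1d(perturbed, lam)
    return abs(w - w_pert)

# Strong convexity of the l2 penalty forces the gap to shrink as O(1/n).
print(stability_gap(100), stability_gap(10000))
```

The gap shrinking at rate 1/n is exactly the uniform β-stability that, via the results cited as [3], translates into generalization bounds.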
Convergence in KL Divergence of the Inexact Langevin Algorithm with Application to Score-based Generative Models
We study the Inexact Langevin Algorithm (ILA) for sampling using estimated
score function when the target distribution satisfies log-Sobolev inequality
(LSI), motivated by Score-based Generative Modeling (SGM). We prove a long-term
convergence in Kullback-Leibler (KL) divergence under a sufficient assumption
that the error of the score estimator has a bounded Moment Generating Function
(MGF). Our assumption is weaker than the $L^\infty$ error assumption (which is
too strong to hold in practice) and stronger than the $L^2$ error assumption,
which we show is not sufficient to guarantee convergence in general. Under the
$L^\infty$ error assumption, we additionally prove convergence in R\'enyi divergence, which is
stronger than KL divergence. We then study how to obtain a provably accurate score
estimator satisfying the bounded-MGF assumption for LSI target distributions,
using an estimator based on kernel density estimation. Combined with the
convergence results, this yields the first end-to-end convergence guarantee for
ILA at the population level. Lastly, we generalize our convergence analysis to
SGM and derive a complexity guarantee in KL divergence for data satisfying LSI
under an MGF-accurate score estimator.
Comment: 36 pages
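The algorithm the abstract analyzes is easy to state: ILA is the unadjusted Langevin update driven by an estimated rather than exact score. The sketch below is not from the paper; it targets a standard Gaussian (true score $-x$, which satisfies LSI), mimics an inexact score estimator with a small hand-made perturbation, and checks that the chain's samples still land near the target's moments. The perturbation form, step size, and chain length are all illustrative choices.

```python
import math
import random

random.seed(1)

def estimated_score(x, err=0.05):
    # True score of a standard Gaussian target is -x; add a small bounded
    # error term to mimic an inexact (e.g. learned) score estimator.
    return -x + err * math.sin(x)

def inexact_langevin(steps=20000, h=0.01):
    # ILA update: x <- x + h * s(x) + sqrt(2h) * z,  z ~ N(0, 1)
    x = 5.0
    samples = []
    for k in range(steps):
        x = x + h * estimated_score(x) + math.sqrt(2 * h) * random.gauss(0, 1)
        if k >= steps // 2:  # discard the first half as burn-in
            samples.append(x)
    return samples

s = inexact_langevin()
mean = sum(s) / len(s)
var = sum((v - mean) ** 2 for v in s) / len(s)
print(mean, var)  # near the target's mean 0 and variance 1
```

The paper's point is what this toy run suggests: with a well-behaved (here, bounded, hence bounded-MGF) score error, the chain stays close to the target, whereas an error controlled only in $L^2$ need not prevent divergence.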